If you’ve read other blog articles by BioLingo, you may have noticed a common theme when we discuss machine translators. Simply put, they are not safe for medical translation. They can be tempting in a pinch, but seriously… don’t use them. They can lead to, and have led to, lethal medical errors, and it’s crucial to remind ourselves to put patient safety front and center before beginning a broader philosophical discussion of this growing technology.
Now that we’ve gotten that out of the way, let’s take this opportunity to explore the potential of machine translation further than we have so far.
In February, Facebook’s parent company Meta announced plans to revolutionize machine translation. They intend to do so by solving, at some undefined point in the future, several of the major problems facing the technology. The first problem their press release mentioned is that less widely spoken languages give their speakers far less access to the World Wide Web. Speakers of more common languages like English, Mandarin, or Russian might think the internet is already sufficiently multilingual, but people who speak Amharic or Kazakh, with no major language to fall back on, will find the internet a very limited place by comparison.
In their announcement, Meta laid out two major goals for achieving greater representation for smaller languages. First, they want to build effective machine translation from far fewer sample texts, since languages that are not widely spoken have smaller libraries of text to draw from. Second, they want to get better at skipping written text altogether, interpreting directly from spoken sources without a textual intermediary. They even claim they would like their simultaneous machine translator to match the intonation and inflection of the speaker to better convey emotion, an ability that barely exists today.
How is Meta going to do this? It’s hard to understand without going down the rabbit hole of how machine translation works, so it may suffice to peek into the rabbit hole, if only briefly. Early machine translation was little more than a dictionary attached to a basic algorithm that translated phrases word for word, producing nonsense sentences in the target language. The next iteration incorporated some grammar and syntax, with modestly better results. Finally, Franz Josef Och, the creator of Google Translate, realized that statistical machine translation, which uses AI to study thousands of pages of text, could detect patterns and language conventions that were nearly impossible for humans to code by hand, creating a much more natural final product. One important quirk of this process is that it translates everything into English first, using English as a default language and an intermediate step between the source and target languages.
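To see why that early dictionary-style approach produced nonsense, here is a minimal sketch in Python. The tiny Spanish-to-English vocabulary is invented for illustration; the point is that substituting one word at a time preserves the source language’s word order, which is exactly what made early output so unnatural.

```python
# A toy sketch of early, dictionary-style machine translation:
# each word is replaced independently, with no grammar model and no
# reordering. (Illustrative vocabulary only, not a real system.)

LEXICON = {
    "el": "the",
    "perro": "dog",
    "blanco": "white",
    "come": "eats",
}

def word_for_word(sentence: str) -> str:
    """Substitute each word in place; unknown words pass through."""
    return " ".join(LEXICON.get(w, w) for w in sentence.lower().split())

# Spanish puts the adjective after the noun, so the Spanish word order
# survives into English untouched:
print(word_for_word("el perro blanco come"))  # -> "the dog white eats"
```

Later statistical systems improved on this precisely by learning, from large volumes of text, which orderings and phrasings are natural in the target language.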
Today, this data-driven approach to machine translation is the norm, and it is the same principle Meta is starting from. According to them, the biggest obstacle to translating less widely spoken languages is the scarcity of sample texts. Their proposed solution is an open-source data toolkit called LASER, whose purpose is to analyze a language using far fewer sample texts. LASER is also designed to work from spoken language as a source, which opens the door to languages without a formal writing system. Finally, LASER aims to translate between languages directly, without a default language in the middle the way English is used in Google Translate. By involving the spoken word more completely, LASER could make spoken language faster and easier to translate and bring in languages that are currently neglected.
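Why does pivoting through a default language matter? A toy example (not Meta’s or Google’s actual pipeline) shows the kind of information the English pivot can destroy: Spanish and French both distinguish informal and formal “you,” but English collapses them into one word, so a system translating via English is forced to guess.

```python
# Illustrative word tables only -- a sketch of pivot vs. direct
# translation, not a real MT system.

ES_TO_EN = {"tú": "you", "usted": "you"}          # English merges both
EN_TO_FR = {"you": "tu"}                          # must pick one form
ES_TO_FR = {"tú": "tu", "usted": "vous"}          # direct mapping keeps it

def via_english(word: str) -> str:
    """Pivot translation: Spanish -> English -> French."""
    return EN_TO_FR[ES_TO_EN[word]]

def direct(word: str) -> str:
    """Direct translation: Spanish -> French, no pivot."""
    return ES_TO_FR[word]

print(via_english("usted"))  # -> "tu": formality lost through the pivot
print(direct("usted"))       # -> "vous": formality preserved
```

Translating directly between language pairs, as LASER aims to, avoids this class of error by never flattening the source meaning into English first.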
It’s hard to discuss Meta’s announcement about its linguistic ambitions without also taking into account its recent challenges with misinformation. In our blog article “Networks of Falsehood” we explored how Facebook manages misinformation in other languages far less effectively than it does in English. This is true even for widely spoken languages like Spanish. In one very clear example, Facebook banned the hashtag #plandemic in 2020, but waited almost a year before banning #plandemia, the Spanish version. It can’t be discounted that, given the recent bad press around its handling of languages other than English, Meta would see now as a good time to publicize its plans for improving accessibility for smaller languages, regardless of whether those plans are thorough and tangible or aspirational and hypothetical. The timeline their article provided for these changes was “in our lifetimes.”
Finally, even if everything Meta says it wants to do for translation does come to pass, it bears mentioning that the announcement doesn’t address some of the most basic issues people have with machine translation every day. Today you can open Google Translate on a smartphone, speak into it, and have it translate almost immediately. When it works, it feels miraculous, almost like a universal translator from Star Trek. Just as often, though, the microphone will mishear a word you said, much the way Siri or Alexa understand you correctly only 70–80% of the time. Then there’s the fact that errors can and do happen, and if there are no multilingual humans around to set the record straight, misunderstandings will arise. Hopefully they’ll be comical rather than injurious. Just be sure not to rely on machine translation in medical scenarios.